Forward-Chaining Planning in Nondeterministic Domains

University of Maryland, Department of Computer Science, College Park, MD 20742, 2004

Abstract

In this paper, we present a general technique for taking forward-chaining planners for deterministic domains (e.g., HSP, TLPlan, TALplanner, and SHOP2) and adapting them to work in nondeterministic domains. Our results suggest that our technique preserves many of the desirable properties of these planners, such as the ability to use heuristic techniques to achieve highly efficient planning.

In our experimental studies on two problem domains, the well-known MBP algorithm took exponential time, confirming prior results by others. A nondeterminized version of SHOP2 took only polynomial time. The polynomial-time figures are confirmed by a complexity analysis, and a similar complexity analysis shows that a nondeterminized version of TLPlan would perform similarly.

Introduction 

One of the biggest limitations of classical AI planning is the assumption of determinism: a classical planner assumes that it knows the exact outcomes of its actions, so that for any given plan and initial state, the world will evolve along a single fully predictable path.

A more realistic assumption is that the world is nondeterministic: an action may have several possible outcomes, but we do not know in advance which one will occur. Unfortunately, this incurs a huge combinatorial explosion: a plan may have exponentially many different execution paths, and the planning algorithm must reason about all of them in order to find a plan that works despite the nondeterminism in the world.

Prior approaches for planning in nondeterministic domains have included model-checking techniques (Cimatti et al. 2003; Cimatti, Roveri, & Traverso 1998) and conformant-planning techniques (Smith & Weld 1998). Algorithms based on these approaches typically examine most or all of the states in the state space.

In deterministic planning domains, much work has been done on ways to improve the efficiency of planners by preventing them from visiting unpromising states. This work has been especially successful in planners based on forward chaining, such as HSP (Bonet & Geffner 1999), FF (Hoffmann & Nebel 2001), TLPlan (Bacchus & Kabanza 2000), TALplanner (Kvarnström & Doherty 2001), and SHOP2 (Nau et al. 2003). These planners know the current state at all times, which facilitates the use of some powerful pruning techniques, especially in hand-tailorable planners, in which the pruning techniques may be domain-specific.

In this paper, we present a way to transport these efficiency improvements over to nondeterministic planning. Our contributions are the following:

• A general technique for “nondeterminizing” forward-chaining planners, i.e., extending them to work in nondeterministic domains. This technique preserves soundness and completeness.

• Conditions under which a nondeterminized algorithm’s time complexity is polynomially bounded by the time complexity of the deterministic version.

• Experimental comparisons of ND-SHOP2 (the nondeterminization of SHOP2) to the well-known MBP algorithm, in two different domains. The experimental results show MBP’s CPU time growing exponentially in the size of the problem, confirming the results in (Pistore, Bettin, & Traverso 2001), and ND-SHOP2’s CPU time growing only polynomially.

• A complexity analysis confirming the experimental results for ND-SHOP2: its running time grows at $\Theta(n^4)$ in one domain and $\Theta(n^5)$ in the other. This level of performance is not restricted to ND-SHOP2: a similar complexity analysis shows that a nondeterminization of TLPlan would have similar growth rates.

Definitions  and  Notation 

Our definitions of states, goals, operators, and actions (ground instances of operators) are the same as in classical planning, except that each operator (and thus each action) may have more than one possible result. A planning problem is a triple $P = (s_0, g, O)$, where $s_0$ is the initial state, $g$ is the goal, and $O$ is the set of operators. Recall that $s_0$ and $g$ vary from one problem to another, and $O$ remains fixed within a planning domain
but  varies  from  one  domain  to  another. 

We also use most of the same definitions used in nondeterministic planning domains. The state-transition function $\gamma(s, a)$ induced by the set of operators $O$ gives the set of all possible states that may result from applying the action $a$ to the state $s$, with $\gamma(s, a) = \emptyset$ if $a$ is not applicable to $s$. The planning domain is deterministic iff $|\gamma(s, a)| \le 1$ for all $s$ and $a$. A policy is a set $\pi = \{(s_i, a_i)\}_{i=1}^{k}$, where $k \ge 0$ is the size of the policy. The set of states in $\pi$ is $S_\pi = \{s \mid (s, a) \in \pi\}$. We require that for any state $s$, there is at most one action $a$ such that $(s, a) \in \pi$. The execution structure for $\pi$ is a directed graph $\Sigma_\pi$ whose nodes are all of the states that can be reached by executing actions in $\pi$, and whose edges represent the possible state transitions caused by actions in $\pi$. If there is a path in $\Sigma_\pi$ from $s_1$ to $s_2$, then we say that $s_1$ is a $\pi$-ancestor of $s_2$ and $s_2$ is a $\pi$-descendant of $s_1$. We use the usual definitions (Cimatti et al. 2003) of weak, strong, and strong-cyclic solutions for planning problems.
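The definitions above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the policy is stored as a dictionary (which enforces at most one action per state), the execution structure is built from a given set of initial states, and the toy transition table and all function names are hypothetical rather than taken from the paper.

```python
def execution_structure(policy, gamma, initial_states):
    """Build (nodes, edges) of the execution structure of a policy.

    policy: dict mapping a state to its single action (assumed encoding).
    gamma:  function (state, action) -> iterable of possible successors.
    """
    nodes = set(initial_states)
    edges = set()
    frontier = list(initial_states)
    while frontier:
        s = frontier.pop()
        if s not in policy:
            continue  # the policy prescribes nothing here, so s is a leaf
        for t in gamma(s, policy[s]):
            edges.add((s, t))
            if t not in nodes:
                nodes.add(t)
                frontier.append(t)
    return nodes, edges


def descendants(s, edges):
    """States reachable from s by a nonempty path (the pi-descendants of s)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    seen, stack = set(), list(succ.get(s, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ.get(u, ()))
    return seen


# Hypothetical toy domain: "go" from "a" may fail and leave the state unchanged.
TOY_TABLE = {("a", "go"): ("a", "b"), ("b", "go"): ("goal",)}


def toy_gamma(s, a):
    return TOY_TABLE.get((s, a), ())


toy_policy = {"a": "go", "b": "go"}
toy_nodes, toy_edges = execution_structure(toy_policy, toy_gamma, ["a"])
```

In this toy case the self-loop on "a" makes "a" one of its own descendants, which is the kind of cycle that strong-cyclic solutions are allowed to contain.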

Researchers have often defined nondeterministic versions of well-known deterministic problems; we formalize this notion as follows. A planning problem $P' = (S_0, g, O')$ is a nondeterministic version of a deterministic planning problem $P = (s_0, g, O)$ if $s_0 \in S_0$ and $O$ is identical to $O'$ except that each operator in $O'$ may have additional sets of effects. These additional effects can be used to model action failures (e.g., a gripper drops what it is holding) and exogenous events (e.g., a road is closed).

Nondeterminizing  Forward  Planners 

We now describe a general technique for taking forward-chaining planners for deterministic planning domains and nondeterminizing them, i.e., translating them into planners that find strong-cyclic solutions in nondeterministic domains. A policy $\pi$ is a strong-cyclic solution if every (finite or infinite) path in $\Sigma_\pi$ can be extended to a finite execution path that reaches a goal state. We have also developed planners that produce weak and strong solutions; see (Kuter & Nau 2004) for details.

The basis of our approach is an abstract planning procedure FCP for deterministic domains, and a corresponding procedure ND-FCP for nondeterministic domains. Both are described below. If a planner A can be described as an instance of FCP, then ND-A, the corresponding instance of ND-FCP, will be the strong-cyclic nondeterminization of A. Using this technique, we have written strong-cyclic nondeterminizations of TLPlan (Bacchus & Kabanza 2000), SHOP2 (Nau et al. 2003), TALplanner (Kvarnström & Doherty 2001), and several other forward planners (see (Kuter & Nau 2004)).

As shown in Figure 1, FCP starts with an initial state $s_0$, and explores other states by successively choosing and applying planning operators until it reaches a state that satisfies the goal formula $g$. The action-selection function $\alpha(s)$ is used to prune the search space. It returns zero or more actions, which we will call the actions that satisfy $\alpha(s)$. Rather than trying all of the actions applicable to $s$, FCP only tries the ones that are applicable and satisfy $\alpha$.

Procedure FCP($s_0$, $g$, $O$, $\alpha$)
    $\pi \leftarrow \emptyset$; $s \leftarrow s_0$
    loop
        if $s$ satisfies $g$ then return($\pi$)
        $A \leftarrow \{(s, a) \mid a$ is a ground instance of an operator in $O$, $a$ is applicable to $s$, and $a \in \alpha(s)\}$
        if $A = \emptyset$ then return(failure)
        nondeterministically choose $(s, a) \in A$
        $\pi \leftarrow \pi \cup \{(s, a)\}$
        $s \leftarrow \gamma(s, a)$

Figure 1: An abstract forward-chaining planning procedure for deterministic planning domains.

Procedure ND-FCP($S_0$, $g$, $O'$, $\alpha'$)
    $\pi \leftarrow \emptyset$; $S \leftarrow S_0$; $solved \leftarrow \emptyset$
    loop
        if $S = \emptyset$ then return($\pi$)
        select a state $s \in S$ and remove it from $S$
        if $s$ satisfies $g$ then insert $s$ into $solved$
        else if $s \notin S_\pi$ then
            $A \leftarrow \{(s, a) \mid a$ is a ground instance of an operator in $O'$, $a$ is applicable to $s$, and $a \in \alpha'(s)\}$
            if $A = \emptyset$ then return(failure)
            nondeterministically choose $(s, a) \in A$
            $\pi \leftarrow \pi \cup \{(s, a)\}$
            $S \leftarrow S \cup \gamma(s, a)$
        else if $s$ has no $\pi$-descendants in $(S \cup solved) \setminus S_\pi$ then return(failure)

Figure 2: Nondeterminization of FCP. The underlines indicate how the coding from FCP is embedded in ND-FCP.
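One way to read the FCP procedure is as a depth-first search in which the nondeterministic choice is realized by backtracking. The sketch below makes that reading explicit. It is an illustration, not the authors' implementation: the backtracking strategy, the depth bound, the cycle check, and the toy integer domain are all assumptions introduced here.

```python
def fcp(s0, is_goal, applicable, alpha, gamma, max_depth=50):
    """Deterministic forward chaining with pruning by alpha.

    gamma(s, a) returns the single successor state (deterministic domain).
    Returns the plan as a list of (state, action) pairs, or None on failure.
    """
    def search(s, plan, visited, depth):
        if is_goal(s):
            return plan
        if depth == 0:
            return None  # depth bound: an assumption to keep the sketch finite
        # A = applicable actions that also satisfy the action-selection function
        candidates = [a for a in applicable(s) if a in alpha(s)]
        for a in candidates:  # backtracking stands in for "choose"
            t = gamma(s, a)
            if t in visited:
                continue  # cycle check: also an assumption of this sketch
            result = search(t, plan + [(s, a)], visited | {t}, depth - 1)
            if result is not None:
                return result
        return None  # A was empty, or every choice failed

    return search(s0, [], frozenset([s0]), max_depth)


# Hypothetical toy domain: states are integers, the goal is to reach 3.
def line_applicable(s):
    return ["dec", "inc"] if s > 0 else ["inc"]


def line_gamma(s, a):
    return s + 1 if a == "inc" else s - 1


def line_alpha(s):
    return {"inc"}  # prune every action that moves away from the goal


line_plan = fcp(0, lambda s: s == 3, line_applicable, line_alpha, line_gamma)
```

With the pruning function above, the search never considers "dec", so it expands only the three states on the direct path to the goal.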

For example, in SHOP2, the search is controlled by a set of methods that decompose tasks into subtasks, and $\alpha(s)$ is the set of all actions $a$ applicable to $s$ such that $a$ can be produced by applying methods to the current task network. In TLPlan (Bacchus & Kabanza 2000), the search is controlled by a formula $\phi$ written in a modal logic called Linear Temporal Logic (LTL). For each state $s$, $\alpha(s)$ is the set of all actions $a$ applicable to $s$ such that the successor state $\gamma(s, a)$ satisfies a progressed formula, $\mathit{Progress}(s, \phi)$.

Figure 2 shows ND-FCP, the strong-cyclic nondeterminization of FCP. Like FCP, ND-FCP plans forward from $S_0$, but the success criterion is more complicated. In FCP, a plan $\pi$ is a solution if its final state satisfies the goal condition $g$. In order for a policy in ND-FCP to be a solution, every state in $\pi$’s execution structure must have at least one $\pi$-descendant that satisfies the goal (a solved state in the pseudocode).
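The ND-FCP loop can be sketched in the same style. The code below follows the three cases of the pseudocode (goal state, state without an action yet, state that already has an action), and again realizes the nondeterministic choice by chronological backtracking. That strategy, the first-in-first-out selection of open states, and the toy domain are assumptions made for illustration only.

```python
def nd_fcp(initial_states, is_goal, applicable, alpha, gamma):
    """Search for a policy (dict state -> action), or return None on failure.

    gamma(s, a) returns an iterable of all possible successor states.
    """
    def descendants(s, policy):
        seen, stack = set(), list(gamma(s, policy[s]))
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                if u in policy:
                    stack.extend(gamma(u, policy[u]))
        return seen

    def search(policy, open_states, solved):
        if not open_states:
            return policy  # S is empty: the policy is returned
        s, rest = open_states[0], open_states[1:]
        if is_goal(s):
            return search(policy, rest, solved | {s})
        if s not in policy:
            for a in applicable(s):
                if a not in alpha(s):
                    continue
                successors = tuple(t for t in gamma(s, a) if t not in rest)
                result = search({**policy, s: a}, rest + successors, solved)
                if result is not None:
                    return result
            return None  # A was empty, or every choice failed
        # s already has an action: it must still have a descendant that is
        # open or solved and lies outside the states covered by the policy.
        frontier = (set(rest) | solved) - set(policy)
        if descendants(s, policy) & frontier:
            return search(policy, rest, solved)
        return None

    return search({}, tuple(initial_states), frozenset())


# Hypothetical toy domain: "safe" may fail and leave the robot at "start".
TRANSITIONS = {
    ("start", "risky"): ("trap",),
    ("start", "safe"): ("start", "mid"),
    ("mid", "finish"): ("goal",),
}


def toy_applicable(s):
    return [a for (t, a) in TRANSITIONS if t == s]


def toy_successors(s, a):
    return TRANSITIONS.get((s, a), ())


toy_solution = nd_fcp(["start"], lambda s: s == "goal", toy_applicable,
                      lambda s: {"risky", "safe", "finish"}, toy_successors)
```

In the toy run, the choice of "risky" leads to a state with no applicable actions, so the search backtracks and settles on "safe", whose self-loop is acceptable because "start" still has a descendant that reaches the goal.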

ND-FCP’s action-selection function is analogous to FCP’s, and the action-selection function for a deterministic domain can generally be adapted to work in nondeterministic versions of the domain. As an example, recall that for TLPlan, the action-selection function $\alpha(s)$ for a planning domain $\mathcal{D}$ returns all actions applicable to $s$ such that $\gamma(s, a)$ satisfies $\mathit{Progress}(s, \phi)$, where $\phi$ is a search-control formula. If $\mathcal{D}'$ is a nondeterministic version of $\mathcal{D}$, then one possible action-selection function for ND-TLPlan is $\alpha'(s) = \{$all actions $a$ such that at least one state in $\gamma(s, a)$ satisfies $\mathit{Progress}(s, \phi)\}$. This trivial adaptation of $\alpha$ works correctly, but ND-TLPlan will usually perform more efficiently if we write an action-selection function that returns only some of the actions in $\alpha'(s)$.
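The trivial adaptation described above amounts to replacing a test on the single successor with an existential test over all possible successors. A minimal sketch follows; the predicate `satisfies_progressed(s, t)` is a hypothetical stand-in for "state t satisfies Progress(s, φ)" and does not implement LTL progression, and the toy domain is likewise an assumption.

```python
def lift_action_selection(applicable, gamma, satisfies_progressed):
    """Build alpha' from a per-successor test.

    An action is kept if at least one of its possible outcomes passes the
    test; this mirrors the trivial adaptation, not a tuned selection function.
    """
    def alpha_prime(s):
        return {a for a in applicable(s)
                if any(satisfies_progressed(s, t) for t in gamma(s, a))}
    return alpha_prime


# Hypothetical toy domain: each move may fail and leave the state unchanged.
def nd_applicable(s):
    return ["inc", "dec"]


def nd_gamma(s, a):
    return (s + 1, s) if a == "inc" else (s - 1, s)


def closer_to_three(s, t):
    # Stand-in control rule: the outcome must be strictly closer to 3.
    return abs(3 - t) < abs(3 - s)


alpha_prime = lift_action_selection(nd_applicable, nd_gamma, closer_to_three)
```

A stricter variant could require every outcome, or every non-failure outcome, to pass the test; which of these is appropriate depends on the domain and is not specified by the text above.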

Theoretical  Properties 

It is not hard to show that FCP and ND-FCP are both sound (i.e., they do not return any plan that is not a solution), and that they are conditionally complete in the sense that they can find every solution whose actions satisfy the action-selection function.

We now establish an upper bound on the time complexity of a nondeterminized planning algorithm in a strongly connected planning domain. A planning domain is strongly connected if for any two states $s$ and $s'$, there exists a sequence of actions that reaches $s'$ when executed in $s$. Such domains are not hard to find: some well-known examples from previous planning competitions include Blocks-World, Logistics, DriverLog, ZenoTravel, Depot, and Rover. Any nondeterminization of such a domain will also be strongly connected.
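For a small domain whose states can be enumerated, strong connectivity can be checked directly from the definition. The sketch below is only an illustration: it assumes an explicit, closed set of states and a hypothetical `successors(s)` function that returns every state reachable from `s` by one action under some outcome. Realistic planning domains have far too many states for this kind of enumeration.

```python
def is_strongly_connected(states, successors):
    """True iff every state can reach every other state.

    Uses two reachability passes from one root: forward edges and reversed
    edges. Assumes successors(s) only returns members of `states`.
    """
    states = list(states)
    if not states:
        return True
    forward = {s: set(successors(s)) for s in states}
    backward = {s: set() for s in states}
    for s, targets in forward.items():
        for t in targets:
            backward[t].add(s)

    def reach(graph, root):
        seen, stack = {root}, [root]
        while stack:
            for t in graph[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    everything = set(states)
    root = states[0]
    return reach(forward, root) == everything and reach(backward, root) == everything


# Hypothetical examples: a ring of four states, and a one-way chain.
ring_connected = is_strongly_connected(range(4), lambda s: [(s + 1) % 4])
chain_edges = {0: [1], 1: [2], 2: []}
chain_connected = is_strongly_connected(chain_edges, lambda s: chain_edges[s])
```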

Theorem 1 Let A be an instance of FCP. Suppose A finds solution plans in time $O(p(|\pi|))$ in strongly connected planning domains, where $|\pi|$ is the size of the solution plan and $p$ is a monotonic function. Then ND-A finds solution policies in time $O(p(|\Sigma_{\pi'}|))$, where $|\Sigma_{\pi'}|$ is the size of the execution structure for the solution policy returned by ND-A.

Corollary 1 Under the conditions of Theorem 1, if the number of possible successors of each state is bounded by a constant, then ND-A finds solution policies in time $O(p(|\pi'|))$, where $|\pi'|$ is the size of the solution policy.

Note that in a strongly connected domain, every path in $\Sigma_{\pi'}$ can be extended to a finite execution path that reaches a goal state. In domains that are not strongly connected, however, $\Sigma_{\pi'}$ may contain states that have no $\pi'$-descendants that satisfy the goals. We call such states the dead-end states of $\Sigma_{\pi'}$.

The following complexity result holds even in planning domains that are not strongly connected:

Theorem 2 Let A be an instance of FCP, and suppose that A’s running time is $O(p(|\pi|))$, where $|\pi|$ is the size of the solution plan returned by A, and $p$ is a monotonic function. Then ND-A finds solution policies in average time $O(p(n) + [\ldots])$, where $n = |\Sigma_{\pi'}|$ is the size of the execution structure for the solution policy $\pi'$ returned by ND-A, $x$ is the maximum number of state-action pairs that are added to any policy after ND-A visits a dead-end state, $y$ is the maximum number of actions applicable to any state, and, in every state $s$, $d$ with $0 \le d \le y$ is the maximum number of actions applicable to $s$ that lead to a dead-end state.

Corollary 2 Under the conditions of Theorem 2, if the number of possible successors of each state is bounded by a constant, then ND-A finds solutions in average time $O(p(|\pi'|) + xd\,[\ldots])$, where $|\pi'|$ is the size of the solution.

Experiments  and  Complexity  Analysis 

In order to perform an empirical evaluation of our theoretical results, we implemented ND-SHOP2, the nondeterminized version of SHOP2. We compared its performance and scalability for reachability goals with MBP (Bertoli et al. 2001), a well-known planning system for nondeterministic domains that is based on symbolic model-checking techniques. In our comparisons, we used MBP’s Local Search algorithm for finding strong-cyclic plans.¹ All of our experiments were run on an AMD Duron 900 MHz laptop computer running Red Hat Linux 7.2 with 256 MB of memory.

Our first experimental domain was the Robot Navigation domain that was used for experimental evaluation of MBP in (Pistore, Bettin, & Traverso 2001). This domain is a variant of a similar domain described in (Kabanza, Barbeau, & St-Denis 1997). It consists of a building with 8 rooms connected by 7 doors. In the building, there is a robot and there are a number of packages in various rooms. The robot is responsible for delivering packages from their initial locations to their final locations by opening and closing doors, moving between rooms, and picking up and putting down the packages. The robot can hold at most one package at any time. To add nondeterminism, the domain also involves a “kid” that can close any of the open doors that are designated initially as “kid-doors.”

We compared ND-SHOP2 and MBP with the same set of experimental parameters as in (Pistore, Bettin, & Traverso 2001): the number of packages $n$ ranged from 1 to 5, and the number of kid-doors $k$ ranged from 0 to 7. For each combination of $n$ and $k$, we generated 20 random problems, ran the planners on the problems, and averaged the CPU time for each planner.²

Figure 3 shows the results for $k = 7$ and $n = 1, \ldots, 5$; this illustrates the behavior of the algorithms as the size of the domain increases.

¹ We also tried using MBP’s Global Search algorithm, but the Local Search algorithm ran faster.

² As in (Pistore, Bettin, & Traverso 2001), the CPU time for MBP includes both its preprocessing and search times. Omitting the preprocessing times would not have significantly affected the results: they were never more than a few seconds, and usually below one second.

Figure 3: Average running times of ND-SHOP2 and MBP on Robot-Navigation problems, as a function of the number of packages. The number of kid-doors in the domain is fixed at 7.

Our experimental results confirm the ones in (Pistore, Bettin, & Traverso 2001): in both their experiments and ours, MBP’s CPU time grows exponentially (the logarithm of the data grows linearly) in the size of the problem. In contrast, the data for ND-SHOP2 show its CPU time growing polynomially, at only about $\Theta(n^4)$. Using the results in the previous section, we have confirmed that this is the correct growth rate for ND-SHOP2 in this domain.
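The distinction drawn above between exponential and polynomial growth can be checked with two straight-line fits: the logarithm of the running time against $n$ (linear for exponential growth) and against $\log n$ (linear, with slope equal to the degree, for polynomial growth). The sketch below shows the arithmetic. The timing values in it are synthetic numbers generated from known formulas purely for illustration; they are not measurements from the experiments, and the function names are hypothetical.

```python
import math


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def polynomial_degree(ns, times):
    """Slope on log-log axes: about d when times grow like n**d."""
    return slope([math.log(n) for n in ns], [math.log(t) for t in times])


def exponential_base(ns, times):
    """exp(slope) on semi-log axes: about b when times grow like b**n."""
    return math.exp(slope(ns, [math.log(t) for t in times]))


# Synthetic data (NOT the paper's measurements): exact n**4 and 2**n curves.
sizes = [1, 2, 3, 4, 5]
synthetic_poly = [0.01 * n ** 4 for n in sizes]
synthetic_exp = [0.01 * 2 ** n for n in sizes]

fitted_degree = polynomial_degree(sizes, synthetic_poly)
fitted_base = exponential_base(sizes, synthetic_exp)
```

On real measurements both fits would be noisy, and with only five problem sizes the estimates should be read as rough indications rather than precise growth rates.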

To illustrate the scalability of the algorithms as the amount of nondeterminism increases, Figure 4 shows the results for $n = 5$ and $k = 1, \ldots, 7$. In each case, MBP takes one to two orders of magnitude more time than ND-SHOP2. The closest runtimes occurred at $k = 4$, where MBP is about 15 times slower than ND-SHOP2.

The reason for ND-SHOP2’s fast performance relative to MBP is as follows. In the representation language for the Robot Navigation domain, a policy can say things along the lines of “if we are at door number 3 and it is open, then go through it,” rather than having to give explicitly all of the exponentially many states of the world in which we’re at door number 3 and the door is open. Because of this, problems in the Robot Navigation domain have strong-cyclic solutions of linear size, and these are the solutions that ND-SHOP2 finds. Although MBP represents policies in a similar way, it apparently does not exploit this representation well enough to produce policies of polynomial size.
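The compactness argument above can be illustrated with a policy written as rules over partial state descriptions. In the sketch below, one rule covers every world state that agrees with it on the mentioned variables, however many other variables (such as package locations) the state contains. The variable names, action names, and the rule format are hypothetical and are not the representation language used by ND-SHOP2 or MBP.

```python
def lookup(rules, state):
    """Return the action of the first rule whose condition matches the state.

    A condition is a dict of variable -> required value; variables it does
    not mention are unconstrained.
    """
    for condition, action in rules:
        if all(state.get(var) == value for var, value in condition.items()):
            return action
    return None


# Two hypothetical rules stand in for exponentially many explicit states.
door_rules = [
    ({"robot_at": "door3", "door3_open": True}, "go-through-door3"),
    ({"robot_at": "door3", "door3_open": False}, "open-door3"),
]

state_one = {"robot_at": "door3", "door3_open": True, "pkg1_at": "room5"}
state_two = {"robot_at": "door3", "door3_open": True, "pkg1_at": "room2"}
state_three = {"robot_at": "door3", "door3_open": False, "pkg1_at": "room2"}
```

Here `state_one` and `state_two` differ only in a variable the rules do not mention, so they are handled by the same rule.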

We also compared ND-SHOP2 and MBP on a nondeterministic version of the classical Blocks World domain. We introduced nondeterminism by allowing the operators to have three kinds of outcomes: (1) their traditional effects, (2) an operator may fail to change the state at all, and (3) an operator may fail by dropping the block onto the table.
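The three kinds of outcomes can be written down as a state-transition function that returns a set of states. The sketch below does this for a single stacking action. The state encoding (a held block plus a set of block-on-support pairs) and the function names are assumptions made for illustration; the source does not give the operator definitions used in the experiments.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple


class BWState(NamedTuple):
    holding: Optional[str]             # block in the gripper, if any
    on: FrozenSet[Tuple[str, str]]     # (block, support); support may be "table"


def is_clear(state, block):
    """A block is clear if it is placed somewhere and nothing sits on it."""
    placed = any(b == block for b, _ in state.on)
    covered = any(support == block for _, support in state.on)
    return placed and not covered


def stack_outcomes(state, x, y):
    """Possible results of stacking the held block x onto the clear block y."""
    if x == y or state.holding != x or not is_clear(state, y):
        return set()  # not applicable: the empty set of successors
    intended = BWState(None, state.on | {(x, y)})        # (1) traditional effect
    unchanged = state                                    # (2) nothing happens
    dropped = BWState(None, state.on | {(x, "table")})   # (3) block falls on table
    return {intended, unchanged, dropped}


# Hypothetical example: the gripper holds "a" and "b" sits on the table.
start_state = BWState("a", frozenset({("b", "table")}))
possible = stack_outcomes(start_state, "a", "b")
```

Outcome (2) is what produces cycles in the execution structure: a policy that retries the action after a failed attempt is a strong-cyclic solution rather than a strong one.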

Figure 5 shows the results of our experiments. As before, each data point is the average of 20 random problems. For MBP, there are no data points for $n > 8$ because it was unable to solve any problems within the allotted time (30 minutes per problem). The logarithm of MBP’s CPU time grows linearly; thus on this problem, like the other one, MBP takes exponential time.

Figure 4: Average running times of ND-SHOP2 and MBP on Robot-Navigation problems, as a function of the number of kid-doors. The number of packages in the domain is fixed at 5.


Curve-fitting on ND-SHOP2’s running time shows it growing at only about $O(n^5)$, and a complexity analysis confirms this polynomial behaviour. The reasons are similar to the ones earlier: each time an operator puts a block in the wrong place, ND-SHOP2’s methods tell it to pick it up and try again. As a result, ND-SHOP2 is guaranteed to produce a policy of size $O(n)$ where $n$ is the number of blocks. In contrast, MBP produces exponential-size policies that tell what to do in most of the states of the world.

It would also be possible to write nondeterministic versions of the blocks world with more complicated kinds of nondeterminism: for example, an operator could drop a block not just onto the table, but onto any clear block. We have not yet tested this case experimentally, but our complexity analysis shows that such an experimental study would yield results similar to the above. ND-SHOP2 would take polynomial time and space, although the space would this time be quadratic rather than linear (it would immediately pick up the fallen block again, but in the worst case there would be $O(n)$ different places to pick up this block from). MBP would take exponential time in the size of the problem, for the same reasons as before.

Although we have not implemented our pseudocode for ND-TLPlan, the nondeterminization of TLPlan, we have analyzed its complexity on the planning domains discussed in this section. The analysis shows that with appropriate control rules, it would perform similarly to ND-SHOP2 in these domains. The details of our complexity analyses can be found in (Kuter & Nau 2004).

Related  Work 

One of the first approaches to planning in nondeterministic domains is conditional planning, which uses an extended form of STRIPS operators that have mutually exclusive and conditional effects (Pryor & Collins 1996; Peot & Smith 1992). This approach does not scale up to complex planning domains.

Figure 5: Average running times of ND-SHOP2 and MBP in the nondeterministic Blocks-World domain, as a function of the number of blocks.

The best-known approach to planning in nondeterministic domains is based on symbolic model-checking techniques (Cimatti et al. 2003; Cimatti, Roveri, & Traverso 1998; Daniele, Traverso, & Vardi 1999; Pistore, Bettin, & Traverso 2001). These works establish the notions of weak, strong, and strong-cyclic solutions, and present algorithms based on symbolic techniques for finding such solutions to nondeterministic planning problems. MBP (Bertoli et al. 2001) is an implementation of the ideas presented in these works.

Planning based on Markov Decision Processes (MDPs) (Boutilier, Dean, & Hanks 1999) also has actions with more than one possible outcome, but models the possible outcomes using probabilities and utility functions, and formulates the planning problem as an optimization problem. In MDPs, a policy is a total function from states to actions, whereas model-checking approaches allow a policy to be partial. For problems that can be solved either by MDPs or by model-checking-based planners, the latter have been empirically shown to be more efficient (Bonet & Geffner 2001).

Satisfiability-based and planning-graph-based approaches have also been extended to planning in nondeterministic domains. To the best of our knowledge, the nondeterministic satisfiability-based planners (Castellini, Giunchiglia, & Tacchella 2003; Ferraris & Giunchiglia 2000) are limited to conformant planning, where the planner has nondeterministic actions and no observability. The planning-graph-based techniques can address conformant planning and a limited form of partial observability (Smith & Weld 1998; Weld, Anderson, & Smith 1998).

In deterministic planning domains, a key property of forward-chaining planners is that they know the current state at all times. This allows them to incorporate very powerful pruning techniques, particularly in hand-tailorable planners, in which the pruning techniques can be domain-specific. The domain-specific knowledge can be encoded into a planner in different forms. For example, HSP (Bonet & Geffner 1999) uses a heuristic function that allows the planner to choose the best successor state that can be reached by applying an action to the current state. Another heuristic-based planner, FF (Hoffmann & Nebel 2001), computes heuristic values for states by running a variation of the GraphPlan algorithm (Blum & Furst 1997) on a relaxed instance of the input planning problem. TLPlan (Bacchus & Kabanza 2000) and TALplanner (Kvarnström & Doherty 2001) use different kinds of temporal-logic formulas. In SHOP2 (Nau et al. 2003), domain-specific knowledge is encoded as task networks, which SHOP2 decomposes recursively until it reaches a task network in which all tasks can be directly executed.

Two planning approaches that can exploit search control in nondeterministic domains are RTDP (Bonet & Geffner 2000), which is based on greedy-search techniques, and SIMPLAN (Kabanza, Barbeau, & St-Denis 1997). SIMPLAN was developed for planning with reactive agents in nondeterministic domains for temporally extended goals. We did not consider SIMPLAN for our experiments since, to the best of our knowledge, it cannot produce strong-cyclic solutions.

Conclusions 

During the past few years, some very efficient forward-chaining planners have been developed for planning in deterministic domains; there are many cases in which these planners can run in polynomial time. The goal of this paper has been to translate these advances in efficiency over to nondeterministic planning domains.

We have presented a general technique for taking forward-chaining planners for deterministic domains (e.g., HSP, TLPlan, TALplanner, and SHOP2), and nondeterminizing them to work in nondeterministic domains. The nondeterminization technique preserves soundness and completeness.

There are significant classes of nondeterministic planning problems in which the number of possible states is exponential but there are solutions whose execution structures have polynomial size. Our theoretical results suggest that in such domains, nondeterminizations of efficient deterministic planners may be able to do well.

The theoretical results are confirmed by our experiments and complexity analyses on two different problem domains. One of these (Robot Navigation) has been used in several previous studies of planning in nondeterministic environments. In both problem domains, our experimental data show MBP’s running time growing exponentially and ND-SHOP2’s growing only polynomially. A complexity analysis for ND-SHOP2 confirms the latter result, and a complexity analysis for ND-TLPlan shows that it would perform similarly to ND-SHOP2 on these problems.

In the near future, we intend to improve our implementation of ND-SHOP2: for example, its implementation of ND-FCP’s reachability tests is rather naive and could be made much more efficient. We intend to test our approach on additional planning domains, extend it to deal with temporally extended goals, and perhaps implement nondeterminizations of additional planners.

In the longer term, we hope to extend the theory of nondeterministic planning domains to encompass several of the kinds of problem features that can be handled by planners such as SHOP2 and TLPlan, such as numeric computations, axiomatic inference, and calls to external information sources and software packages. We also hope to develop a technique for translating planners for deterministic domains into planners for probabilistic domains.